Ozone as an environmental driver of influenza

Under long-standing threat of seasonal influenza outbreaks, it remains imperative to understand the drivers of influenza dynamics which can guide mitigation measures. While the role of absolute humidity and temperature is extensively studied, the possibility of ambient ozone (O3) as an environmental driver of influenza has received scant attention. Here, using state-level data in the USA during 2010–2015, we examined such research hypothesis. For rigorous causal inference by evidence triangulation, we applied 3 distinct methods for data analysis: Convergent Cross Mapping from state-space reconstruction theory, Peter-Clark-momentary-conditional-independence plus as graphical modeling algorithms, and regression-based Generalised Linear Model. The negative impact of ambient O3 on influenza activity at 1-week lag is consistently demonstrated by those 3 methods. With O3 commonly known as air pollutant, the novel findings here on the inhibition effect of O3 on influenza activity warrant further investigations to inform environmental management and public health protection.

activity within 2 weeks in the community when keeping all other variables the same.
The totality of evidence has been proposed as a way forward to strengthen causal inference with observational data 14 : cross-checking multiple methods can enable more robust causal interpretation that is supported by multiple algorithms and conceptual principles but contradicted by none.Here, we propose an integrative methodological framework with three distinct approaches in examining the effect of ambient O 3 on influenza dynamics in the USA.Namely, (1) Convergent cross mapping (CCM), a causality test kit based on state-space reconstruction (SSR) for dynamical systems 15,16 , (2) a graphical modeling approach called Peter-Clark-momentary-conditional-independence plus (PCMCI+) 17,18 , and (3) a statistical regression method Generalized Linear Model (GLM) 19 , are used in our study.Since these methods are endowed with disparate theoretical assumptions and hidden biases, we envision that the confidence to make causal inference regarding the scientific question of interest would be strengthened if consistent findings are reached 14,20 .

Dynamic data of environmental variables and influenza
Weekly time series of environmental variables (namely, O 3 , AH, and T) and influenza activity ("Flu") in the USA during October 3, 2010 to May 31, 2015 are depicted in Fig. 1. "Flu" is calculated by taking the product of two proportions: the proportion of influenza-like-illness (ILI) cases among all clinical visits in the community and the proportion of influenza-positive specimens, being an arguably good proxy measure of influenza activity in the community (see details in the "Method" section).Seasonality is observed in all the time series, with influenza activity showing winter peak but summer trough, and environmental variables showing the opposite to certain extent.There are long stretches of 0 values in the influenza time series, especially outside of influenza season, which contain little causal information for exploration.As a consequence, this study only focused on the influenza season (that is considered October through May) in the USA for analysis.The weekly mean level of influenza activity is 4:25 × 10 À3 , which can be understood as 425 expected cases per 100,000 population.

Environmental drivers of influenza
The estimated effects of environmental factors on influenza activity in the USA using three methods (namely, CCM, PCMCI+, and GLM) are summarised in Table 1.While these methods adopt different measures of effect size, they share one common interpretation rule: a negative value indicates a negative effect size, and vice versa.The negative effect of 1-week lagged AH on influenza activity was detected by PCMCI + and GLM but not CCM, and the negative effect of 2-week lagged air T on influenza was detected by GLM alone.Ambient O 3 was found to reduce influenza activity (p<1:0 × 10 À3 ) at lag 1 (week), consistently by three distinct methods.
CCM under the umbrella of Empirical Dynamic Modeling (EDM) approach was used to conduct the causality test in dynamical systems 15 .The intuition behind CCM is to examine how well a hypothesized driving variable can be cross-mapped (cross-predicted) by the effect variable given putative causal information injected.And such cross-mapping should perform better as more data points (i.e., larger library size) are available to construct the attractor, showing a convergence property, thus the name Convergent Cross-Mapping (CCM) Fig. 1 | Weekly time series of environmental measurements and influenza activity in 46 states of the USA during 2010-2015.Influenza activity is indicated by the variable "Flu" which is the product of two proportions: the proportion of influenza-like-illness cases among all clinical consultations in the community and the proportion of laboratory confirmed influenza-positive specimens among all specimens tested.Long stretches of 0 values for "Flu" in non-influenza season (June to September), shaded in gray columns, contain little information for causal inference and thus are omitted from data analysis.
for causality test (see details in the "Methods" section).Here, the CCM skill was calculated as the improved prediction accuracy obtained at the maximum library size over that at the minimum library size (Δρ CCM = ρ maxLib À ρ minLib ).To rule out the influence of shared seasonality between time series, the CCM skill (Δρ CCM ) of observed time series was tested against surrogate data.Figure 2a presents the statelevel CCM surrogate tests on whether there is a causal effect of O 3 , AH and T at 1-week lag on influenza activity; Fig. 2b presents a summary of measured Δρ CCM of each state and the nationwide meta-significance estimates by summing the logs of state-specific CCM test p values.While the state-level relationships are variable and all three environmental factors could drive influenza activity in some states (as signified by filled red dots), the nationwide test indicates that O 3 alone is an environmental driver of influenza activity at the significance threshold of p meta <1:0 × 10 À3 ; moreover, CCM skill for O 3 is significantly greater than that for AH (p<2:8 × 10 À5 ) and T (p<4:3 × 10 À5 ) by Wilcoxon rank sum test.When repeating CCM in the nonsensical causal direction by setting the candidate cause and effect in reverse, none of the CCM results is significant (see Supplementary Fig. S1), addressing the concern of spurious relationship generated by noncausal synchrony.
Following CCM causality test, multivariate S-map (sequential locally weighted global linear map) is conducted to quantify the effect magnitude of putative environmental drivers on influenza activity 16,21 .Figure 3 plots the state-specific median effect size of O 3 , at 1-week lag, on influenza activity onto the map, showing a mostly negative sign across the states.The median effect size is −0.106; that is, one SD increment in O 3 (8.3ppb) leads to a 0.106 SD decrease in logittransformed influenza activity in the following week.All states see negative effect size except for the two states of Mississippi and Hawaii though they do not pass the CCM causality tests (Fig. 2a).The effect estimates of AH and T on influenza are also negative, but they fail to pass the CCM causality tests (Table 1).
To cement the credibility of making causal inference, a probabilistic graphical modeling framework called PCMCI+ (see detailed description in "Methods" section) is adopted to estimate the causal networks of the underlying system 17,18 .The output of PCMCI+ reads as a directed acyclic graph (DAG) which can be interpreted as random variables (in nodes) linked by causal dependencies under certain assumptions (Supplementary Table S1).Figure 4a shows the state-bystate dependency networks.Eighteen out of 46 states see a direct link from O 3 to influenza activity, of which 17 are of a negative sign and 1 is positive for North Dakota.Thirteen states see direct and/or indirect links from AH to influenza activity: the effect sign is mixed but largely negative.Fifteen states see negative link from T to influenza activity, of which 10 links are direct, but 5 are indirect through O 3 .Here, the hyperparameter significance level (α PC ) for iteratively filtering out spurious links was set as 0.05 at the state-level analyses.
Figure 4b shows a summary dependency graph estimated by concatenated time series from individual states; a more stringent criterion α PC of 1:0 × 10 À3 was used to yield the nationwide graphical output.Ambient O 3 bears a direct negative effect on influenza activity at lag 0 (i.e., within the same week) and a lag of 1 week; air T affects influenza activity in a negative manner indirectly through O 3 at lag 0. With respect to AH, its total effect on influenza activity is obscure: aside from a direct negative coupling at lag 1, it also has indirect but positive effect through O 3 .AH and T are strongly coupled with each other, as expected.Supplementary to the above two causal discovery methods, customary regression method of GLM is conducted at each state to analyze the relationship of environmental factors with influenza activity, adjusting for secular trend, seasonality, as well as inherent autocorrelation.Figure 5a indicates that 1-week lagged statistical associations between each environmental variable and influenza activity are mixed at state level.The state-level regression coefficients are then pooled using a meta-analysis model (Fig. 5b).One SD increment in O 3 concentration is associated with a reduction of 0.102 (CI: −0.186, −0.018; p<5:9 × 10 À5 ) in logit-transformed influenza activity 1 week after (Table 1).Meanwhile, AH and T are also negatively associated with influenza activity at lag 1 and lag 2, respectively (Table 1).

Discussion
This study made use of the weekly time series data of influenza and environmental variables in the states of USA during 2010-2015 and three distinct methods for dynamic data analysis (namely, CCM, PCMCI+, and GLM) in order to provide more reliable answers to the question on environmental drivers of influenza.Three sets of results consistently demonstrate the negative impact of ambient O 3 on influenza activity in the community.
Hitherto, a limited number of population-level studies have examined the relationship between ambient O 3 and influenza, and the findings have been mixed.The integrated assessment of O 3 by the USEPA 22 cited two references that reported positive associations of O 3 and influenza in Hong Kong and Brisbane 23,24 , respectively.The report in Hong Kong, however, was not actually on the relation of O 3 and influenza specifically; rather, that report aggregated influenza and pneumonia into one group, which was associated with environmental O 3 .The other report on the positive association of O 3 with pediatric influenza in Brisbane did not strenuously control for potential temporal confounding in the time-series analysis.A more recent time series analysis using Hong Kong surveillance data during 1998-2013 demonstrated that ambient O 3 is negatively associated with reduced influenza transmissibility (i.e., real-time effective reproduction number, Rt) 8 .The current study based on the data in the USA, demonstrates again an inhibiting effect of O 3 on community-level influenza activity.
One explanation of the O 3 inhibition effect on influenza could relate to its direct virucidal potential.O 3 inactivation of influenza virus has been reported in studies in vitro.Influenza virus (WSN strain) suspended on a thin film was inactivated within a few hours by an O 3 concentration of 160 ppb (342 µg/m 3 ) 25 .In mice studies, however, O 3 exposure at 500 ppb appeared to have no effect on pulmonary virus In all three sets of results, bold values suggest relationships with statistical significance of p<1:0 × 10 À3 .In CCM, causality test against 1000 seasonal surrogates was first performed for each state and those p values were then pooled to obtain meta-significance for nationwide results, while effect was estimated by multivariate S-map analysis.In PCMCI+, state-level data were pooled to obtain one set of results for the nation, and effect is measured by momentary conditional independence (MCI) test with partial correlation method.In GLM, regression with logit link function was first performed for each state, the coefficients of which were then pooled by meta-analysis for nationwide results.CCM convergent cross mapping, PCMCI+ Peter-Clark-momentary-conditional-independence plus, GLM generalized linear model.
titers; rather, O 3 diminishes the severity of influenza virus infection and lung injury evidenced by less widespread infection in the lung parenchyma 26,27 .In the current study, the average level of daily maximum 8-h O 3 is less than 40 ppb in the USA.At this low ambient level of O 3 , O 3 -primed host immunity against influenza infection constitutes a more plausible explanation of the population findings presented here.Inhaled ambient O 3 primes the pulmonary immune system boosting allergic responses in healthy and susceptible populations 28,29 .Following O 3 exposure, a myriad of immune responses is triggered, and multiple interleukins (IL) are released from epithelial cells, macrophages, and other myeloid cells 30 .Among them, IL-33, acting as an endogenous "alarmin" in response to airway barrier damage incurred by O 3 31,32   , is endowed with pleiotropic and homeostatic functions orchestrating airway injury and repair 12,30 .Likewise, IL-33 is also highly expressed following invasion of influenza virus, playing a pivotal role of dynamic immune modulator during the course of infection 33,34 .It is plausible that O 3 -induced IL-33 in the cytokine milieu is involved in an immune crosstalk assisting human defense against influenza.
In the setting of inflammation combating foreign antigen, overexpressed IL-33, signaled via its receptor ST2, can be redirected from the default type 2-inducing capacity to augment type 1 immunity, amplifying antiviral CD8+ T cell and natural killer cell responses 13,35,36 .In mice models of influenza infection, exogenous IL-33 inoculation could enhance recruitment of dendritic cells (DCs), increase secretion of pro-inflammatory cytokine IL-12, and prime cytotoxic T-Cell responses, facilitating viral clearance 37 .IL-33 may protect against influenza by orchestrating Th1/Th2 paradigm and so maintaining a fine balance of pro-inflammatory pathogen clearance and antiinflammatory tissue repair 38,39 .During the resolution phase of infection event, IL-33 acts on residential ST2-expressing group 2 innate lymphoid cells (ILC2s) as well as regulatory T (Treg) cells to restore airway tissue homeostasis, mediated at least partly by amphiregulin (AREG)-dependent repair of virus-damaged epithelium 40,41 .
The hypothesis of O 3 -elicited IL-33 conferring cross-protection against influenza gains strength further from the evidence of its promising role as a mucosal vaccine adjuvant 42 .Exogenous IL-33 co-  administered intranasally with recombinant influenza A hemagglutinin (rHA) induced significantly higher antigen (Ag)-specific plasma immunoglobulin G (IgG) and mucosal IgA antibody (Ab) levels as well as enhanced production of both Th1 and Th2-related cytokines, all of which resulted in better protective capacity of the vaccine 43 .Besides, endogenous IL-33 release, within 24 h, after administration of alumadjuvanted nasal influenza vaccine, induced higher IgA Ab production via enhancing Ag presentation on DCs and promoting ILC2 activation 44 .These findings allude to possible parallels between adjuvanticity of nasally administered alum and ambient O 3 exposure.This study strives to triangulate the evidence by integrating results from different research approaches with distinct theories and working hypotheses.Under the combo methodological framework, CCM employs the idea of SSR for attractors of deterministic dynamical systems thereby addressing better nonlinear state-dependent couplings, but it is less well suited for time series of purely stochastic nature that is better handled with PCMCI+ method 45 .By contrast, PCMCI+ builds on assessing (conditional) probability distribution of random variables, and thus lacks power to recover the non-separable couplings in deterministic systems 46 .On the other hand, the difficulty of PCMCI+ in handling latent (hidden) variables can be addressed in the state-space based method of CCM 47 .The qualitative dependency structure revealed by PCMCI+ can then be complemented by quantitative risk estimates in the time series regression method of GLM by properly controlling for confounding factors.
A few limitations and caveats of the current study require due consideration.The first concern derives from its reliance on passive surveillance data, which can be subject to measurement error.For example, surveillance practices to identify laboratory-confirmed influenza cases as well as healthcare-seeking behaviors due to influenza-like-illness (ILI) can vary across states and years, which may hamper our ability to accurately estimate the actual influenza activity.However, by utilizing a comprehensive proxy measure that combines the laboratory and clinical data, such issue has been minimized to our best effort 48 .Secondly, with the scarcity of virologically confirmed subtype data, we aggregated influenza cases across all influenza subtypes to reduce the number of missing values.Since lineage-specific differences might exist in the effects of O 3 and other climatic factors, future studies are warranted to integrate the subtyping and antigenic information into analysis.Thirdly, note that our findings are generated from publicly available state-level weekly data over a time span of 5 years.It remains an important topic for future studies to decode the nuanced relationships of environmental variables with influenza activity on finer spatial and temporal scale, since factors such as demographic features, social connectivity, tourism activities (e.g., Hawaii), as well as public health interventions can lead to fundamentally different base transmission potential, which may interact with environmental factors to shape the complex influenza dynamics 4,49 .
In closing, this study reveals a negative impact of ambient O 3 on community-level influenza activity by triangulating evidence derived from distinct data analysis approaches.Our finding warrants more laboratory or molecular studies to corroborate the mechanisms shaping the observed causal link in the population, so as to better inform environmental management for public health protection.Moreover, we hope that this work, through a novel integration of divergent analytical frameworks, will catalyze further coordinated efforts in causal discovery using observational dynamic data.

Influenza data
We retrieved state-level weekly laboratory confirmed influenza data from the USA Center for Disease Control and Prevention (CDC) website during the period from October 3, 2010 to September 27, 2015.The counts of laboratory positive cases for influenza (by type A and type B) were reported weekly by designated laboratories located in each state through the platform of World Health Organization (WHO)/ National Respiratory and Enteric Virus Surveillance System (NREVSS)  Collaborating Labs.We also retrieved the weekly reported data on medically attended visits for ILI from CDC for each state during our study period.Over 3000 outpatient healthcare providers throughout the country reported the number of patients with ILI and number of total patient consultations via the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet) each week.
Due to limited testing capacity in each state, laboratory surveillance data are unable to fully represent influenza activity in the population.Fortunately, ILI data from sentinel outpatient clinics possibly cover a wider spectrum of community influenza cases, despite lower diagnostic specificity.In our study, we incorporated the information from these two available sources to quantify the community-a b level influenza activities.We computed the representative proxy of influenza activity (denoted as "Flu") in community by multiplying the weekly proportion of ILI outpatient consultations with the weekly proportion of specimens tested positive for influenza.That is, Flu proxy = ðFlu positive =Flu total Þ*ðILI outpatient =Outpatient total Þ.This proxy measure, developed and used by epidemiologists to minimize underreporting bias in the laboratory-confirmed influenza data as well as to address unobservability of infection in the ILI data, has been validated as a better linear correlate of influenza virus infections in the community population [50][51][52] .

Environmental variables data
To explore the proposed environmental drivers of influenza activity, we obtained ambient O 3 concentration data in the USA for each state from the Tropospheric Ozone Assessment Report (TOAR) of the International Global Atmospheric Chemistry (IGAC) 53 .Specifically, station-based daily maximum 8-h average O 3 levels (ppb) were extracted and averaged by state for analysis.State-level weather data were collected from the National Center for Environmental Information (NCEI), the National Oceanic and Atmospheric Administration (NOAA).During the extraction of environmental variables, a centering approach was used to reduce the station-related measurement bias 54 .That is, raw environmental measurements were centered with respect to their long-term station-specific average and then added with the state-wide average.The daily averages of air temperature (°F) and dew point temperature (°F), computed from hourly land-based station observations, were extracted and converted to Celsius values.Daily absolute humidity, a measure of water vapor density in the air (g=m 3 ), was calculated from the dew point and air temperature, following standard meteorological formulas 55 .The daily O 3 and weather time series were further aggregated into week to match the resolution of influenza data.

Statistical analyses
There are many long stretches of 0 values in the influenza time series, especially outside of flu season, which contain little or very limited  causal information about the environmental factors.Consequently, this study only focused on the flu season (that is considered October through May) in the USA for analysis 56 .States which have at least 3 consecutive years of available influenza data are included.Out of the 50 states, 46 were finally eligible for study; states of Vermont, Rhode Island, New Jersey, and Florida were excluded.
The analytical framework in this study consists of three distinct but complementary methods: CCM, PCMCI+, and GLM, to strive for detection of causal links notably by evidence accumulation (Please refer to a tabulated summary of these three methods in Supplementary Table S1).Firstly, the SSR-based CCM method was applied to examine the existence and strength of causal imprint (as defined in dynamical systems) of candidate drivers on influenza activity, followed by quantification of the effect size using S-map.Secondly, graphical modeling method PCMCI+ was conducted to depict the causal dependency structure (detailed assumptions are explained below and in Table S1) between environmental variables and influenza activity by a DAG 57 .Finally, traditional time series regression GLM was applied to estimate the statistical association of each environmental variable with influenza activity while controlling for the other two simultaneously.To explore the temporal structure of dependency, we tested multiple time lags from 0 to 2 with weekly data.Our analyses were conducted in the R (version 4.1.1)and Python (version 3.8).The "rEDM" package (version 1.9.2) 58 , a collection of methods for EDM, was utilized to generate the CCM and S-map results.The "Tigramite" package (version 4.2) was harnessed to complete the PCMCI+ dependency network modeling 17 .The "mgcv" (version 1.8-37) 59 was adopted to fit the quasi-Binomial regression model.Nationwide map for visualizing statespecific S-map effect estimates was produced using "usmap" R package (version 0.6.1) 60.

CCM
Given dynamic data, CCM which harnesses the technique of SSR for dynamical systems 47 was introduced to evaluate the potential driving role of ambient O 3 , AH and T on the influenza activity.In the dynamical system theory, as the system state evolves over time, its motion can trace out a trajectory, which collectively form a geometric object often called "attractor manifold", in the multi-dimensional coordinate space whose axes are the set of causally relevant variables such as humidity, O 3 concentrations, infection rates, and so forth.Therefore, time series data of observed variables can be simply comprehended as projection of the whole system dynamics onto certain axis.This underpins the SSR technique following the basic tenet of Takens' Theorem 61 ; that is, the original multivariate manifold (M) can be reconstructed using just one of the system variables (e.g., X t ), by taking its delayed coordinates (i.e., embedding) with a time lag τ: where E is the embedding dimension that can sufficiently "unfold" the dynamics of system manifold so that reconstructed states on shadow manifold M X map 1:1 to the original states on M 15 .
As one of the corollaries to Takens' Theorem, CCM is the causality test kit (from the aspect of dynamical systems) in the EDM framework proposed by Sugihara's research group in 2012 15 .This method assumes low-dimensional deterministic system with limited stochasticity 45 .The basic idea is that if variable X has a causal influence on variable Y , then the driven time series Y t with enough delayed embedding (i.e., reconstructed manifold M Y ) should contain the necessary dynamics information to recover or cross-predict the current values of X t , but not vice versa.This practice of using the response variable Y to forecast the causative variable X seems counterintuitive, but it has been well illustrated with algebraic equations by Sugihara's group 15 .The underpinning algorithms of CCM are built on simplex projection 62 .Given the shadow manifold M Y , the E + 1 nearest neighbors of y t which correspond to similar system states sharing evolving patterns are first selected.Next, the time indices of these neighboring points of y t are adopted to locate the corresponding points in M X (a putative neighborhood of the predictee).Then, a locally weighted average of the E + 1 values of X is calculated to predict the cross-mapped estimate of xt .Here, the weights are assigned based on the Euclidian distance from y t to its each nearest neighbor on M Y .The value of E was chosen over the range of 2 to 6 where the maximum of univariate is achieved via leave-one-out cross-validation (Table S2).The lower limit of = 2 was specified in order to embed at least one external variable to reconstruct multivariate manifold; the upper limit of E = 6 was specified because the maximum E should not be larger than the square root of the consecutive time series data length 63 , which was 35 or 34 weekly data points in influenza season in the current study.
After the cross-mapping is done, we can evaluate the accuracy (i.e., predictability) by the correlation coefficient (ρ) between the predicted and observed values of X series.As the number of data points used for prediction (that is, library size, L) becomes larger, the reconstructed shadow manifold M Y will become denser, and the closer nearest neighbors will accordingly lead to lower estimation error (i.e., higher ρ).Such behavior is referred to as "convergence" and is generally utilized to distinguish true causality (as defined in dynamical systems) from simple correlations 15 .Here, we compared the crossmapping skill (ρ) obtained by the maximum library (i.e., the whole data length) to that obtained by the minimum library (i.e., E + 2 data points allowing for simplex projection), and quantified the convergence property of cross-mapping as Δρ CCM = ρ maxLib À ρ minLib .In the vein of Deyle et al., shared seasonality of environmental exposures with activity is another ponderable issue in this context 7 .preclude spurious CCM results as an artifact of mutual seasonal forcing, we generated an ensemble of 1000 surrogates with randomized seasonal anomalies for the putative cause time series 7 .Consequently, a null distribution of CCM skill (Δρ CCM ) using surrogate time series was formed.As a test of statistical significance, the cross-mapping skill obtained for the original time series should exceed the 95th percentile of the null distribution built by seasonal surrogates (i.e., at the α<0:05 level).Then, the classical Fisher's methods (that is summation of logs of individual p values) was applied for meta-significance test for all the states 64 .To combat the anti-conservativeness of meta-significance p value estimate, we used a stringent significance threshold of α as 1:0 × 10 À3 .We also repeated CCM analysis setting the candidate cause and effect in the nonsensical reverse direction (i.e., to test whether drives environmental factor) to address the concern of synchrony yielding spurious covariation 65 .After qualitative causal relationship was tested, multivariate S-map technique, a method also packed in EDM, was used to examine how and to what degree the putative environmental driver (e.g., O 3 ) influences the influenza activity, thereby quantifying the effect size 21 .Unlike simplex projection using just nearest neighbors, S-map procedure uses all available data points (thus "global") in the library to fit local linear regressions at each successive point along the manifold attractor.Through including a nonlinear localization parameter θ, S-map controls the weighting assigned to each point, thereby tuning how strongly the regression is localized to the target states 66 .The θ was chosen over the range [0.01, 9] that maximizes the univariate S-map forecast performance using leave-one-out cross-validation (Table S2).By taking multivariate embeddings (i.e., using putative causal variable in addition to time-lagged vectors of the effect variable itself) for SSR 16,67 , the S-map coefficients could approximate the relevant Jacobian matrix elements (that is, partial derivatives ∂Flu=∂Env) in the dynamical multivariate state space, which ably indicates the dynamically state-dependent effect strength of environmental factors ("Env") on influenza activity ("Flu") 16,21 .To ensure an equal weighting for variables of different scales in the multivariate SSR model, time series are normalized to unit mean and variance before analysis, so the magnitude of EDM effects is in a standardized metric.PCMCI+ PCMCI+, a novel conditional independence-based method proposed by Jakob Runge in 2020 17 , was employed as a complementary tool to recognize the dependency structure between O 3 , AH, T and influenza activity under the graphical modeling framework.Leveraging on the classical PC algorithms (named after the developers Peter and Clark) 68,69 reoriented to observational time series 70 , PCMCI+ consists of two main phases, namely skeleton learning phase and subsequent orientation phase.Beyond that, PCMCI+ optimized the selection of conditioning set in conditional independence tests and well exploited autocorrelation, which was demonstrated to yield much higher detection power and orientation recall with better-controlled false positives.Below is a more detailed introduction of PCMCI+ in plain terms.For a good review and comprehensive understanding of this method including algorithms behind, please see Runge et al. papers 17,18 .
Skeleton phase.Initialized with a complete undirected graph (G) where all nodes (variables) are inter-connected, the goal of skeleton phase is to eliminate the spurious links caused by indirect paths and common drivers via iterative conditional independence tests at some significance threshold α PC .Here, for the sake of interpretability and analytical tractability, we implemented conditional independence test with Partial Correlation method (i.e., "ParCorr" in the kit) which assumes linear dependencies but also suffices when the non-linear links can be linearly approximated 46 .As a tuning hyperparameter in the condition-selection step, the significance threshold α PC was set as 0.05 for state-by-state analysis, while significance was calibrated at a stringent 1:0 × 10 À3 level when analyzing the nationwide graphical structure by concatenating all the state-level time series.
In practice, the skeleton edge removal phase is conducted for lagged and contemporaneous conditioning sets separately.To illustrate, given X j t representing each variable/node in the system or target graph (G), we firstly test its putative driver/parent (denoted as PðX j t Þ) over all the time-lagged variables X i tÀτ , where i,j 2 f1, . . .,Ng and τ 2 f1, . . .,τ max g (here, τ max was set as 2 based on domain knowledge).In graphical terms, a causal link X i tÀτ !X j t will stand if and only if X i tÀτ and X j t are dependent given any set of conditions.Starting with the first iteration (r = 0), unconditional (i.e., bi-variate) independence tests are performed for all the pairs ðX i tÀτ ,X j t Þ and X i tÀτ is removed from the hypothetical PðX j t Þ set if the p value cannot pass the significance level α PC .In each next iteration (r !r + 1), we first sort the PðX j t Þ obtained from last iteration by the absolute value of test statistic, and then select the strongest r + 1 parents as the conditions to conduct conditional independence tests for pairs ðX i tÀτ ,X j t Þ.After another round of screening, the hypothetical PðX j t Þ set for each X j t is further narrowed down and the algorithm will finally converge with no more conditions available for test.In this way, we could identify the lagged potential parents of each variable.
Secondly, to identify contemporaneous potential parents of each X j t , this stage initializes the graph (G) with all contemporaneous variables presumptively linked, together with all lagged dependencies screened from the previous stage.For all the pairs ðX i tÀτ ,X j t Þ (here, τ 2 f0, . . .,τ max g to also examine contemporaneous causal links), momentary conditional independence (MCI) tests are conducted, iterating through all combinations of subset of the contemporaneous conditions (denoted as S).Besides, the sets PðX j t Þ and PðX i tÀτ Þ estimated in the previous step are additionally conditioned on, aiming to account for the common drivers, indirect links, and autocorrelation (i.e., paths blocked) with higher detection power and recall.With similar parents-filtering process at each iteration as the previous step, a skeleton of G with both contemporaneous and lagged links is finally obtained.
Orientation phase.After discovery of the skeleton structure, it is necessary to orient the edges on G to infer directionality of relationship.Assumptions of causal stationarity (that is, X i tÀτ !X j t holds for any t) and time-order cause always precedes effect) are applied to help constrain certain cases and simplify the orientation task.For a lagged or adjacency, time order reveals the directionality without ambiguity.While for contemporaneous links, the orientation process can be divided into two steps including a collider orientation stage followed by additional PC constraint rules.
Based on the collider rule, for unshielded triple structures X i tÀτ !X k t À X j t ðτ>0Þ and X i tÀτ À X k t À X j t ðτ = 0Þ, we firstly conduct MCI test for pair ðX i tÀτ ,X j t Þ conditioning on all possible S, together with their recognized lagged parents (see details above).Second, we store the S when X i tÀτ is independent of X j t .Then we calculate the fraction π of S that contains X k t .Since a collider (i.e., common effect) would falsely open the link between ðX i tÀτ ,X j t Þ (which turns to spurious link) if it were conditioned on, S is not assumed to contain any collider of the pair ðX i tÀτ ,X j t Þ when they are conditionally independent given S. Thus, π could be an indicator of the possibility that X k t is not a collider.Finally, we use the "majority" rule to decide the existence of colliders (see details in paper 17 ).The structure is oriented as a triple of collider when π<0:5, as unoriented when π>0:5, and as ambiguous when π = 0:5.
We further determine link directions for the remaining contemporaneous links with three complementary rules.Rule #1 (R1) is to avoid "colliders", since all the colliders are assumed to be already recognized in the collider-hunting stage.For all remaining unshielded structure X i tÀτ !X k t À X j t , orient it as a chain.Rule #2 (R2) is to avoid "cycles" (assuming no feedback loops in the system; colloquially, a variable cannot be its own descendant), which is a tacit assumption when drawing DAGs.For multi-path motifs including X i t !X k t !X j t with X i t À X j t , orient it as X i t !X j t .Rule #3 (R3) is to avoid both "colliders" and "cycles".structures including X i t À k t !X j t and X i t À X l t !X j t where pair ðX l t ,X k t Þ is independent (i.e., not linked) but pair ðX i t ,X j t Þ is of an unoriented link, then orient it as X i t !X j t .Finally, under the standard assumptions of Causal Markov, (adjacency) Faithfulness, causal sufficiency, and causal stationarity (see detailed explanations in Table S1) 18,68,71,72 , the output of PCMCI+ algorithms can be interpreted as the causal network structure of the system, conveniently depicted by a directed (or partially directed) acyclic graph (DAG) composed of nodes (representing random variables) and directed edges (representing causal relations).Note that contemporaneous links can remain unoriented indicating the Markov equivalence class or due to conflicting orientation rules.The node color denotes the autocorrelation (labeled "auto-MCI"), varying from 0 to 1, at the lag with maximum absolute value.The link color stands for the sign (i.e., negative or positive) and strength of the connection estimated by MCI test (labeled "cross-MCI") varying from −1 to 1. Straight and curved edges represent the contemporaneous and lagged causal links, respectively; if multiple lagged links occur between paired variables, the color of link will embody the strongest one but with numeric labels indicating all significant lags sorted by cross-MCI values.

GLM
We applied a conventional time-series regression analysis using generalized linear model (GLM) to estimate the association of O 3 , AH, and T with influenza activity within each state.Since influenza activity is proportion data, a quasi-Binomial link with logit (i.e., log Y 1ÀY À Á ) function that is arguably a reasonable choice, was adopted 73 .By analogy with GLM regression, the target variable "Flu" (i.e., influenza activity) was logittransformed in EDM and PCMCI+ analyses as well, which could give us an overall interpretation benchmark.Besides, due to exponential transmission pattern of influenza cases, such scale transformation can help discern small differences in influenza activity.To modulate the case when Y takes a value of 0, a random small number (i.e., 25% to 100% of non-zero minimum "Flu" level for each specific influenza season) was added to allow for defined transformation.To filter out the potential confounding effect of unmeasured variables, we included dummy variables for the year to capture the secular trend, and dummy variables for each month of a year to capture seasonality in the model.To account for strong autocorrelation caused by disease transmission, we took the logarithm of 1-week lagged outcome variable (i.e., log Y tÀ1 À Á ) as another covariate in the model, which was provably able to match the likely mechanism better (so named "transmission term") and predict outcomes with reduced residual dispersion 74 .
When estimating the relationship of O 3 with influenza, the sameweek AH and T are simultaneously included in the model as a linear term to control for confounding.The regression coefficient estimates together with their corresponding 95% confidence intervals (CIs) were computed for exposures at lag 0, lag 1, and lag 2 (in weeks), respectively, for each state.The state-wise and lag-specific effect estimates were pooled with a random-effects meta-analysis (using restricted maximum-likelihood estimator for the between-study variance) 75 , with the statistical significance threshold set stringently as 1:0 × 10 À3 .

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 3 |
Fig.3| State-specific effect strength estimates for ozone (O 3 ) affecting influenza activity at 1-week lag.Multivariate S-map technique is used to estimate the effect size, along with CCM test on causality.With normalization of data, the effect magnitude is in a standardized metric.A diverging palette centered at 0 is used to distinguish between positive (red) and negative values (blue).The darker the color, the stronger the effect size in either direction.Four states (FL, NJ, RI, and VT) shaded in gray are excluded from analysis due to influenza data missingness.Map was plotted using "usmap" R package (version 0.6.1)60 .

Fig. 4 |
Fig. 4 | Graphical modeling by PCMCI+ based on dynamic data of environmental factors (ozone [O 3 ], absolute humidity [AH], temperature [T]) and influenza activity (Flu) in the states of USA. a State-specific causal graph estimates.Curved and straight edges represent the lagged and contemporaneous causal dependencies, respectively; the number on the curve indicates a lagged

Fig. 5 |
Fig.5| Effects of environmental factors on influenza activity, at 1-week lag, estimated by generalized linear model (GLM).a State-level point estimates of regression coefficients (β) and their 95% confidence intervals shown as circles and line ranges, respectively (n = 173 weeks of dynamic data); Circles are filled when the p value for statistical significance test is <0.05 (two-sided).b Nationwide point risk estimates with the corresponding 99.9% confidence intervals shown as circles and line ranges, respectively, after pooling state-level coefficients (n = 46).Circles are filled when the p value is <0.001 (two-sided) during meta-analysis.O 3 ozone, AH absolute humidity, T temperature.

Table 1 |
Effects of environmental factors on influenza activity estimated by three distinct methods, based on weekly statelevel data of the USA during 2010-2015